Transposons, Genome Evolution and Deja vu
Transposons, or transposable elements, are segments of DNA that are self-replicating and can insert themselves into different parts of the genome (1,2). The insertion of a transposon into the genome can disrupt a gene or alter its expression level. Transposition can also cause chromosomal translocations that lead to gene duplication. When an organism carries two copies of the same gene, the two copies can evolve at different rates. In such a case, one copy can mutate further to yield a gene product with a different function that benefits the organism, without the function of the other copy being lost.

Types of Transposons

Class I: retrotransposons, which move through RNA intermediates. The RNA intermediate serves as a template from which a reverse transcriptase synthesizes the DNA copy of the transposon that is inserted at the new site (3).

Class II: also known as DNA transposons, these encode their own transposase enzyme, which catalyzes their excision and insertion. This mode of transposition is referred to as a cut and paste mechanism, in which the transposon is excised and inserted into a new target site (4). Gaps resulting from this process are then filled in, using the single-stranded overhangs produced during excision as templates, and ligated.

The movement of transposons throughout the genome is tightly regulated by mechanisms that are still under investigation.

Transposons and genome evolution

By disrupting genes, or by duplicating them when they insert into genes or copy adjacent gene sequences, transposons can also drive genome evolution. When the incorporated or duplicated sequences are transcribed and their products benefit the organism, selective pressures in its environment may favor the duplication event, or a fusion gene formed during transposition, and so contribute to the evolution of the host organism (5).
Such a process could be responsible for the evolution of multidomain proteins, as described in the study below.

Deja vu

The majority of eukaryotic proteins have two or more domains. These domains usually evolve independently and have specific functions of their own. The signaling and metabolic pathways that are important for a particular organism depend on its unique repertoire of multidomain proteins. Novel domain combinations can emerge through gene duplication or fission events, leading to proteins with a gain of function or to new proteins altogether. Such changes can affect the metabolic capacity or signaling networks of an organism, and with them its survival and evolution. In the study described below, Zmasek and Godzik (6) examined the complete genomes of 172 eukaryotes to determine the evolutionary history of the domains observed in these species and the evolutionary relationships between them. Their study led to the discovery of species-specific core domains. It also showed that 25% of all currently known protein domains have evolved a number of times in different species, and that this parallel evolution has greatly influenced the repertoire of domains present in different organisms.

Results

The 172 eukaryotic genomes studied represent five of the eukaryotic supergroups in the current model of the tree of life and are shown in Figure 1. Members of the Opisthokonta group were overrepresented among the genomes analyzed (112 genomes), while the Amoebozoa were the least represented, with genomes from only three members studied. The proteins from all the genomes were analyzed with the HMMER3 programs to identify protein domains based on the Pfam database definitions. The coverage of the domains identified varied widely, and outliers were present at both the high (95th percentile) and low (40th percentile) ends of the range. These domains are presented in Table 1.
The ciliate Tetrahymena thermophila, the protozoan Plasmodium chabaudi and the parasite Trichomonas vaginalis contain a large proportion of unique protein domains. These domains are grossly under-represented in the Pfam database because they are absent from model organisms. The authors therefore supposed that the lower domain coverage for these organisms reflected the lack of relevant domains in the database against which comparisons could be made. The pufferfish Takifugu rubripes, on the other hand, had the highest level of coverage (94%), followed by the human genome at 85%. The number of distinct domains identified in this study totaled 34,778, of which only 33 were common to all the eukaryotes studied. Notably, 22,241 of these domains were found in only a single genome.

Approach

To analyze the domains efficiently, the authors adopted a binary combination set. For example, a protein containing domains A, B, C and D was represented as the binary set A~B, B~C and C~D, which preserves the domain order. This approach led the authors to infer that the number of domain combinations remained fairly constant between organisms, with only a few exceptions. The number of combinations observed is shown in Figure 2. These data also suggested that the number of domains an organism possesses is directly proportional to its organizational complexity. To correct for the exponential growth between organisms, the authors used an algorithm that takes into account the average ratio between the sum of the number of distinct domains and the sum of those same domains squared. However, this normalization did not reduce the disparity in the number of domains observed when fungi, land plants and green algae were compared to the other organism groups (Figure 3). Since the supergroup Opisthokonta is overrepresented among the genomes analyzed in this study, it is not surprising that 57% of the unique domains observed were found in this group (Figure 4A).
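The binary combination encoding described under Approach can be illustrated with a short Python sketch. It assumes, as the A~B, B~C, C~D example suggests, that only adjacent domains are paired and that the domains are listed in their order along the protein. The function names are hypothetical and are not taken from the authors' software.

```python
from typing import Iterable, List, Set, Tuple

Pair = Tuple[str, str]


def binary_combinations(domains: List[str]) -> List[Pair]:
    """Return the ordered pairs of adjacent domains in one protein.

    Assumption: only neighbouring domains are paired, so a protein
    with n domains yields n - 1 pairs (none for a single domain).
    """
    return list(zip(domains, domains[1:]))


def distinct_combinations(proteome: Iterable[List[str]]) -> Set[Pair]:
    """Collect the distinct adjacent-domain pairs across many proteins."""
    combos: Set[Pair] = set()
    for domains in proteome:
        combos.update(binary_combinations(domains))
    return combos


if __name__ == "__main__":
    protein = ["A", "B", "C", "D"]
    print(["~".join(pair) for pair in binary_combinations(protein)])
    # prints ['A~B', 'B~C', 'C~D']
```

Because each pair is ordered, A~B and B~A count as different combinations, which is how this sketch keeps the domain order mentioned in the text.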
Fungal genomes made up the largest share of the genomes analyzed, yet the fungi-specific domains discovered accounted for only 14% of the total number of distinct domains. Within the Holozoa, which belong to the Opisthokonta group, chordates were found to have the largest number of species-specific domains (Figure 4B), with a significant percentage of these unique domains found in the choanoflagellates (Figure 5).

Parallel evolution

Using unweighted parsimony analysis to determine the evolutionary inheritance of the domains among the species, Zmasek and Godzik showed that a number of the domains unique to one group of organisms (e.g. bilaterian animals) were also found in another group (green algae), where they had arisen at different evolutionary times. This suggests that they resulted from separate, parallel evolutionary events and were not passed down from the last common ancestor of the organisms under study (Figure 5). Another example of parallel evolution of protein domains is the Amidohydrolase~aspartate/ornithine carbamoyltransferase domain combination, which arose independently in Dictyostelium and in metazoans. In metazoans, this combination event resulted in a multifunctional protein in mammals (Figure 6).

References

# http://www.broadinstitute.org/education/glossary/transposable-elements
# http://www.nature.com/scitable/topicpage/transposons-the-jumping-genes-518
# http://waynesword.palomar.edu/lmexer3.htm
# http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/T/Transposons.html
# Fedoroff NV (2012) Transposable Elements, Epigenetics and Genome Evolution. Science 338(6108): 758-767 http://www.sciencemag.org/content/338/6108/758
# Zmasek CM and Godzik A (2012) This Deja Vu Feeling: Analysis of Multidomain Protein Evolution in Eukaryotic Genomes. PLoS Comp. Biol. 8(11) Open Access http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1002701
# Feschotte C and Pritham EJ (2007) DNA transposons and the evolution of the eukaryotic genome. Annu Rev Genet 41: 331-68 http://www.ncbi.nlm.nih.gov/pubmed/18076328